Mon. Not. R. Astron. Soc. 000, [l]{T9] (2011) Printed 21 December 2011 (MN IATeX style file v2.2) 



Three-dimensional shapelets and an automated 
classification scheme for dark matter haloes* 

C.J. FlukeH, A.L. Malec^, P.D. Lasky^'^'^, B.R. Barsdelli 

^Centre for Astrophysics & Supercomputing, Swinburne University of Technology, PO Box 218, Hawthorn, Victoria, 3122, Australia 
^ Theoretical Astrophysics, Eberhard Karls University of Tiibingen, Tiibingen 72076, Germany 
^School of Physics, University of Melbourne, Parkville, VIC 3010, Australia 



Accepted 18 December 2011 



ABSTRACT 

We extend the two-dimensional Cartesian shapelet formalism to d-dimensions. Con- 
centrating on the three-dimensional case, we derive shapelet-based equations for the 
mass, centroid, root-mean-square radius, and components of the quadrupole moment 
and moment of inertia tensors. Using cosmological A^-body simulations as an applica- 
tion domain, we show that three-dimensional shapelets can be used to replicate the 
complex sub-structure of dark matter halos and demonstrate the basis of an auto- 
mated classification scheme for halo shapes. We investigate the shapelet decomposi- 
tion process from an algorithmic viewpoint, and consider opportunities for accelerat- 
ing the computation of shapelet-based representations using graphics processing units 
(CPUs). 

Key words: methods: data analysis - methods: analytical - (cosmology:) dark matter 
- (cosmology:) large-scale structure of Universe 



1 INTRODUCTION 

Complex, three-dimensional structures abound in astron- 
omy on all scales from "fluffy" dust aggregrates in molec- 
ular clouds (Ossenkopf 1993; Stepnik et al. 2003), to cos- 
mological large-scale structure that has been described as 
"sponge-like" (Gott, Dickinson & Melott 1986), or a "skele- 
ton" (Sousbie et al. 2008) of clusters, filaments and voids 
(Barrow, Bhavsar & Sonoda 1985; White et al. 1987). 

While aspects of these structures can be expressed in 
terms of simple, geometrically-motivated properties such as 
their triaxiality or quadrupole moment, these quantities are 
not able to capture the higher order complexity of the true 
shape. The challenge, therefore, is to provide an accurate 
description of an arbitrary three-dimensional (3-d) shape, 
possibly over many physical length scales, in the hope that 
this can lead to improved theoretical or analytical insight 
into the structure in question. 

The human visual system is more than capable of iden- 
tifying structures and sub-structures for an individual 3-d 
object, but such qualitative interpretations only have lim- 
ited use - it is not practical to attempt a classification of 
shapes by eye when there are many thousands of objects 



to inspect jj The preferred alternative is an automated ap- 
proach including: 

• decomposition via an appropriate basis set (e.g. Fourier 
analysis, wavelet transformations); 

• partitioning [e.g. Voronoi tesselation - see Icke & van 
de Weygaert (1987) for an early cosmological application]; 

• the use of minimal spanning trees to identify connected 
structures (Barrow et al. 1985; Pearson & Coles 1995); 

• Minkowski functionals [which return global geometric 
properties such as volume, surface area and edge density - 
Mecke, Buchert & Wagner (1994); Sahni, Sathyaprakash & 
Shandarin (1998)]; and 

• segementation [e.g. "dendrograms" used by Goodman 
et al. (2009) to identify self-gravitating structures in molec- 
ular clouds]. 

The approach we present in this paper is the exten- 
sion of the two-dimensional (2-d) shapelet method (Refregier 
2003) to three dimensions. Shapelets are sets of orthonormal 
basis functions based on the Hermite polynomial solutions 
of the quantum harmonic oscillator (QHO). Simple analytic 
forms can be derived for the physical properties of 3-d struc- 
tures (e.g. centre of mass, root-mean-square radius and the 
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components of the quadrupole moment and moment of inter- 
tia tensors), which can be efficiently calculated in shapelet 
space. 

In astronomy, 2-d shapelets have been applied to image 
simulation (Massey et al. 2004; Ferry et al. 2008), the mor- 
phological classification of galaxies (Kelly & McKay 2004; 
Andrae, Jahnke & Melchior 2011) and sunspots (Young et 
al. 2005), and weak gravitational lensing (Refregier & Ba- 
con 2003). The latter includes the measurement of shear 
(Kuijken 2006), flexion (Goldberg & Bacon 2005), point- 
spread function modelling and deconvolution (Melchior et 
al. 2009; Paulin-Henriksson, Referegier & Amara 2009), and 
weak lensing by large-scale structure from the FIRST ra- 
dio survey (Chang, Refregier & Helfand 2004). Massey et 
al. (2007) investigated weak lensing with polar shapelets 
(Massey & Refregier 2005) , a form more suitable for images 
with rotational symmetry. Further properties of shapelets, 
including integral relations and convolution sums are pre- 
sented in Coffey (2006). 

The importance of the shapelet approach lies not so 
much in the basis functions, but in the simplifed compu- 
tation of quantities relating to shape and structure that 
can be determined once a shapelet decomposition has been 
obtained. These analytic quantities are expressed as linear 
sums of weighted shapelet states, greatly reducing the cal- 
culation complexity compared to (numerically) solving the 
related integral formulations. 

Shapelet decomposition is not without its problems [see 
Berry, Hobson & Withington (2004) for an extensive discus- 
sion]. Melchior, Meneghetti & Bartelmann (2007) examined 
the limitations of shapelet image analysis in cases where the 
orthonormality condition [see equation Q below] fails, and 
proposed a decomposition procedure that preserves physi- 
cal properties of images. Melchior et al. (2010) and Bosch 
(2010) considered problems with using circular Gaussian ba- 
sis functions to model galaxies with high ellipticity or a large 
Sersic index. Ngan et al. (2009) proposed an alternative or- 
thonormal basis based on the Sersic profile (hence Sersiclets) 
for use in weak lensing analysis. While helping to avoid is- 
sues with poor shape recovery from overfitting low signal-to- 
noise galaxies, and fitting with too many degrees of freedom, 
Sersiclets do not possess the analytic properties of shapelets, 
and the basis functions must be generated numerically. In- 
deed, it is the existence of analytic functions that has mo- 
tivated our choice of 3-d Cartesian shapelets as an appro- 
priate tool for quantifying properties of three-dimensional 
structures. 

The remainder of this paper is set out as follows. In Sec- 
tion [2] we present the mathematics of 3- and d-dimensional 
Cartesian shapelets. New analytic expressions are presented 
for several important physical properties of 3-d structures 
in Section [3] In Section |4] we describe issues relating to 
implementing an efficient 3-d shapelet decomposition code. 
We highlight the inherent high-degree of paralellism in the 
shapelet decomposition algorithm, which makes it a promis- 
ing target for graphics processing units. In Section [5j we 
present first applications of 3-d shapelets to problems in 
cosmological simulations, with an emphasis on studying sub- 
structure in dark matter halos, demonstrating how an auto- 
mated shape classifier can work in shapelet space. We end 
with a summary and outlook for 3-d shapelets in astronomy 
in Section |6l 



2 CARTESIAN SHAPELETS 

In this section we present the Cartesian shapelet formalism. 
For full details of the one- and two-dimensional cases, and 
applications, see Refregier (2003). 



2.1 One-dimensional Cartesian shapelets 

The one-dimensional (1-d) shapelet functions are 

Br,(x-p) = r^'''<t>^{r^x), (1) 

where /3 is a scaling length, n is a non-negative integer and 



^2 



(2) 



with Hn (x) the n-th order Hermite polynomial. Higher order 
shapelets can be obtained using the recursion relation (see 
Appendix [A| for some useful expressions): 



Bn{x-P) = (^^^ y|B„_i(x;/3) - yi^B„_2(x;/?) (3) 



where 

Bo{x;P)=r'^\-'^*e-''"^^^" 
and 

B^{x;l3)^^Bo{x;P). 



(4) 



(5) 



The 1-d shapelets form an orthonormal basis, satisfying: 

/oo 
B„{x;P)BUx-p)dx = 5„r„ (6) 
-oo 

where Snm is the Kronecker delta symbol. Shapelets are 
smooth and continuously differentiable everywhere. The 
shapelet coeffecients for a sufficiently well-behaved 1-d func- 
tion, f{x), are found through the integral: 



/oc 
fix)B4x-l3)dx 
- OO 



(7) 



allowing the function to be re- written as a sum of (weighted) 
shapelets: 



(8) 



As we show in Section |4] the calculation of /„ poses the 
main computational challenge. In practice, n is limited to 
n ^ Umax and the integral of equation |[7| is calculated 
over a finite volume. However, the orthonormality condi- 
tion assumes infinite support - so power from higher order 
shapelets may be lost, and the orthonormality requirement 
may no longer strictly hold if the integration region is too 
small (Melchior et al. 2007). 

2.2 Three-dimensional Cartesian shapelets 

Using the orthonormality of 1-d shapelet functions, the basis 
functions for 2-d shapelets are (Refregier 2003): 

B2,r.(a3; P) = r^(j,2,r.{r'x). (9) 

where 

(j)2,n{x) = (I}„j^{xi)(f>n2{x2) (10) 
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Figure 1. Examples of three-dimensional Cartesian shapelets (/9 = 1). Top row: (left) n = (0,0,0); (right) n = (0,1,0). Bottom row: 
(left) n = (2,0,2); (right) n = (1,2,4). For each panel, we calculate the maximum data value, /max, and generate 10 equally spaced 
iso-surfaces over the range (—/max, /max). Individual isosurfaces are coloured with a two-ended intensity colour map: blue — black — 
orange. 



with X — {xi,X2) and n — (ni,n2). Consequently, the ex- 
tension to 3-d Cartesian shapelets is now almost trivial: 

Bs,r.ix; 13) = r'^'^hAr^x)- (11) 
where 

(l)'i^n{x) = 0ni(3;i)0n2(a^2)0n3(2;3) (12) 

with X = {x-i,X2,xz) and n = (711,712,713). The 3-d Carte- 
sian shapelet coefTecients have the form: 



h,r^^ f f3{x)B3,r.{x;P)d\ (13) 

with the integration occuring over the infinite volume of the 
domain, V, and the 3-dimensional shapelet decomposition 
is: 

oo 

h{x)= J2 f3,r.B3,r.{x;(3). (14) 
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We present examples of 3-d Cartesian shapelets in Fig. [T] 
using equally-spaced isosurfaces, and a two-ended intensity 
colour-map ranging from blue (negative values) to black 
(zero) to orange (positive values). 

Two further useful quantities are the characteristic ra- 
dius of a 3-d shapelet: 



and the size of small scale oscillatory features: 

e3,min~/3(^^„,ax + 3/2)-'''^ 



(15) 



(16) 



These expressions are based on well known quantum me- 
chanics results for the QHO, and are the 3-d versions of 
the expressions presented in Refregier (2003). They provide 
a starting point for determining appropriate decomposition 
parameters, as discussed in Section [4. 2| 



2.3 d-dimensional Cartesian shapelets 

It is straightforward to infer that the d-dimensional gener- 
alisation of the shapelet basis functions is: 



with 

d 



(17) 



(18) 



We can then write a general orthonormality condition: 

^ d 

/ Bd.nix; l3)Bd,m{x; I3)d''x = Y\_Snim.i, 
the shapelet coeffecients have the form: 



(19) 



/d,n = / fd{x)Bd^rt{x;P)A X 
Jv 

and the d-dimensional shapelet decomposition is: 

oc 

fd(x)= ^ fa,nBa,n{x;l3). 

ni,n2,...,n^=0 



(20) 



(21) 



In d-dimensions, the characteristic sizes are: 
ed,max ~ /3 (n,^ax + (22) 

and 

9d,min~ P{nrm.^ + d/2)-'/\ (23) 

We note that Coffey (2006) refers to the d-dimensional 
solutions of the harmonic oscillator, but does not present 
specific d-dimensional results in the form we use. 



3 ANALYTIC EXPRESSIONS 

Refregier (2003) demonstrates how analytic expressions can 
be obtained for common properties of 2-d images. We now 
derive analytic expressions for physical properties of 3-d 
structures using 3-d Cartesian shapelets, and their gener- 
alisation to d-dimensions. 



3.1 Zeroth moment 

The zeroth moment. Mo, of an arbitary (well-behaved) func- 
tion, /a (a;), in three dimensions is 



Mo = / f3ix)d-'x. 

' V 



(24) 



Writing this in terms of the shapelet coefficients, using equa- 
tion (14 1, and the orthonomality condition, equation ([6|: 



Mo = h,-n. / Bnida;i / B„^Ax2 / B^^Ax^ (25) 

ni, na.na ■' -°° 



"1 ,"2,1.1 



where 

Uni ,712 ,^^3 

and 



Til \ / n2 
ni/2 J I 712/2 



nzl2 



-I 1/2 



(26) 



(27) 



(28) 



are factors that recur in the analytic expressions to follow. 
We have used the integral property [see equation (17) of 
Refregier (2003)] for even n: 



J n 



n 
/2 



1/2 



(29) 



while for odd n, the integrals in equation ([25f vanish as 
B,i{x; j3) is an odd function. 

For applications in image processing, Refregier (2003) 
identifies total flux, F, with the 2-d zeroth moment. In 3-d, 
a more natural association might be made with total mass, 
M, for an object with density field, fd,{x) — p{x). 



3.2 Centroid 

The centroid position of a 3-d object is: 

Xif3{x)d^X 



Mo 



(30) 



for i = 1,2,3. The orthonormality condition enables us to 
write the series expansion as (for clarity, we only show results 
for xi): 



Xl 



^ /s.TiJnaJns J XlBr^^dXl. 



(31) 



7^^ ,7in ,n-> 



Using the recursion relation, equation (A3 1, and the fact 
that 



r '^d^.^o, 

J — c 



(32) 



dxi 

gives the intermediate result 

Xl = -r-— f 3, n J 712 J I VmTTB„ ^ + 1 da: 1 . (33) 

Mo J_oo 
ni,"2,"3 



With the notation introduced above, we have 

^3/4^5/2 °dd even 
Xl = — ^ ^ fs^nVni + IU77 1 ,n2,n3 + 1 ,712 

(34) 



Mo 

Til "2,13 

and similar results for £2 and £3. 
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3.3 Root-mean-square radius 

The root-mean-square (RMS) radius of a 3-d object is: 

JWo Jv 



(35) 



where x = \x\ = ^xf + X2 + x-^, gives an estimate of the 
physical extent of the object under investigation. Substitut- 
ing equation ( |14[ ) into the above, and using equation ||6|: 



2 

'"RMS 



^ even 

i E /a. 



Mo 



ni ,712 ,Ti3 



Jn2Jn3 / a^iBnido;! (36) 



From equation ( |A4[ ) and noting that 

d^ 

we have 

rRMS = J7 X] ■^3,n (m +n2 + n3 + 3/2) 



(37) 



Mo 

1 ,n2,n3 



"1 >n2 >"3 

"2, "3 ■ 



(38) 



3.4 Quadrupole moment tensor 

The quadrupole moment tensor is: 

Qij = / f-s{x) [SxiXj - x'^Sij) d^x, 
Jv 



(39) 



which is symmetric and traceless, so that there are only 
five independent elements. Performing the same calculations 
as in the previous section, the diagonal components of the 
quadrupole moment tensor are: 

even 

Qll = 2-K^^^l3'^^^ ^ /3,7x(2ni-n2-n3)t/„^,„2,n3W^ni,"2,i3-(40) 
ni ,712 ,713 

Q22 and Q33 have a similar form. The off-diagonal compo- 
nents are 

odd even 

Q12 = 3^3/4^7/2 J2 J2h.W{ni + l){n2 + l) (41) 
11,12 "3 

1+1,12+1,13 
and similarly for the other Qij with i ^ j. 



3.5 Moment of inertia tensor 

For the special case where f3{x) — p{x) represents a mass- 
density field, we can calculate the moments of interia. The 
moment of interia tensor describes all moments of interia 
of an object about different axes of rotation, usually calcu- 
lated with respect to the centre of mass of the object. In 
component form: 



/,:, = / f3{x){x'^5ij - XiXj)d^x. 

V 



(42) 



In coeffecient space, the diagonal elements of the interia ten- 
sor are: 

even 

7ii = 27r'/V^'' /3,"('^2+n3-fl)(7„i,„,,„3M/„i.„2,i3(43) 



and similarly for /22 and 133. The off-diagonal elements are 

odd even 

/12 = _^3/4^V2 ^ ^/3,„V(ni-fl)(n2 + l) (44) 



11 ,10 13 



,n2 ,^^3 ,12 +1 ,13 

and similarly for the remaining elements. 

3.6 Transformations 

Refregier (2003) demonstrates how shapelet coeffecients are 
modified under a general coordinate transformation in terms 
of a set of operators generating rotation, convergence, shear 
and translation. As we have not used the operator formula- 
tion explicitly elsewhere in the present work, we choose not 
to introduce this approach now. Instead, we treat simple co- 
ordinate transformations in terms of a modification of the 



integral in equation 1 13 1 



Consider an arbitrary (small) coordinate transforma- 



tion: 



cc' = (1 + *)a; -I- e 



(45) 



where e is a translation, and is a 3 x 3 transformation ma- 
trix. To obtain the shapelet coeffecients of the transformed 
input shape, /3^„, we must solve the integral: 



/3^„~ / /3(a;-*a;-e)B3,n(a:;/3)d'x 



(46) 



for each n, which is first order in 4". We introduce trans- 
formed coordinates, and a new set of shapelet basis func- 
tions. 



B^.r^ix; P) ^ B3,n(a;' - *a;' - e) 



(47) 



In general, the relevant integral expressions for transformed 
coordinates must be calculated numerically. We can gain 
insight into the effect of simple transformations by consid- 
ering the effect of translations and dilations on the shapelet 
ground state, B3,„=(o,o,o) (a:; /3). 



3. 6. 1 Translation 

The effect of a (small) translation, e = (ei,e2,e3), on the 
shapelet coefficients is: 



J 3,Tl 



/ f3{x')B3,n{x - e)d^x' 
Jv 



(48) 



As an example, we solve this for the n-tuples: (0,0,0), 
(1,0,0) and (2,0,0), to find: 



/3,n={0,0,0) 
/3^n = {l,0,0) 



= e 1' ^ e 
ei 



2/4/5 g-E3/4/3^ 



/3V2 



-4/ii3' -4/ii3' 



3,n={2,0,0) 



_!!_^g-^?/4/3%-Ei/4/3=g-el/4;3= 



/32y2 



(49) 
(50) 

(51) 
(52) 



As expected, shapelet power is transformed from the ground 
state to higher-order shapelet terms. In all cases, if any of 
the ei — 0, then the orthornormality condition, equation 
prevails. 
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3.6.2 Dilation 

Next, we consider a transformation that is a pure dilation: 



The centroid is: 

























K3 



(53) 



where all the <C 1. Transformed shapelet coeffecients 
are: 



(54) 



Using the ground state shapelet and the same n-tuples as 
previously, we find: 

/3';r.={o.o,o) = 2^/2[2 + ^i(2 + Aci)]-i/2 (55) 
X [2 + K2{2 + Ka)]"'/^ [2 + K3(2 + n^T^'^ 

fl-n.^(l,0,0) = (56) 
/3'^r.= {2,0,0) = 2Kli2 + Kl)[2 + Kl{2 + Kl)]-^/^ (57) 
X [2 + K2{2 + K2)]-^^^ [2 + K3(2 + K'i)]-^^^ 

Since the ground state is a symmetric shape, under a dila- 
tion, the odd shapelet coefficients vanish. 



3.6.3 Rotations 

The same approach can be used for rotations about the co- 
ordinate axes, which are defined in terms of the standard 
3x3 rotation matrices of the form: 



Ri( 



1 

COS Si —sin Si 
sin 9i COS 8i 



(58) 



and similarly for rotations about the a;2-axis, R2(S2), and 
a;3-axis, R3(6'3). A sequence of rotations can be combined 
into a single general rotation matrix, Ra,(0). The coordi- 
nate transformations for rotations are tractable but more 
complex algebraically than for translations and dilations - 



equations ( 48 1 and 1 54 1 . Rather than providing a general 
analytic form for the rotations, we instead demonstrate the 
resulting change in amplitude of shapelet coeffecients under 
an abitrary rotation in Section[5.2| in particular Figs. [6]|8] 



3.7 d-dimensional expressions 

We can use the results from the previous sub-sections to 
obtain analytic expressions in d-dimensions. The zeroth mo- 
ment is: 



where now 

i-^ 711 ,n2 , . ■ ■ — 

and 

1/2 

/ f) \ 



n V n,/2 



(59) 



(60) 



(61) 



Xl 



Mo 



(62) 



ni n2,...,nw 



,712 , . . . ,rid ^^^ni + I,n2,...,7id5 

and similarly for X2,...,Xd. Finally, with x — \x\ — 

^xf + X2 + . . . + x'^, we have the d-dimensional RMS ra- 
dius: 

2 _ 2^^/*/?2+''/2 

rRMS — (.^ni,...,UdVV„i, 

even 



(63) 



E -^d." (ni+n2 + ...+nd + - 



We do not attempt to derive d-dimensional equivalents 
of the quadrupole moment or moment of inertia tensors, 
as these are more natural quantities in three-dimensions. 
However, the generalised approach we have demonstrated 
can be applied to other properties defined as d-dimensional 
integrals of fd{x). 



4 IMPLEMENTATION ISSUES 

Before we can use the analytic expressions of Section [3] 
to study three-dimensional objects, we need to obtain the 
shapelet coefficients. In this section, we discuss some of the 
issues in implementing an effecient 3-d shapelet decomposi- 
tion code. 



4.1 Voxellation 

In applications to image simulation (Massey et al. 2004; 
Young et al. 2005) and gravitational lensing (Refregier & Ba- 
con 2003; Goldberg & Bacon 2005; Kuijken 2006), shapelet 
quantities are calculated for a pixel grid of image intensities, 
which is often obtained as a 'postage stamp' region selected 
from a larger image. For the 3-d case, we use a regular cubic 
mesh of voxels (volume elements). 

The discrete sampling of the 3-d structure onto a mesh 
means we need to integrate each shapelet term over the phys- 
ical size of a voxel, under the assumption that the data value 
in the voxel is constant. This is valid for data that is already 
on a grid (e.g. from a mesh-based simulation), and can be 
achieved for point-based data by smoothing to the grid with 
an appropriate smoothing scheme. 

For integration over a finite cubic volume, V, over spa- 
tial range Xmin to Xmax (and similarly for y and z), equation 



( 13 1 is replaced by a summation over Ng voxels: 



/3 



Ng.Ng,Ng 



Bn{x)A X 



(64) 



where our grid-based 3-d shape has a constant value in each 
voxel, fijk- The volume is assumed to be sufficiently large 
that the fijk — > outside of the integration region. 

Following Massey & Refregier (2005), the orthonormal- 
ity of shapelets means we can simplify the per-voxel inte- 
gration of the shapelet term as the product of three one- 
dimensional integrals of the form: 



/n(i) = / Br,{x)dx. 



(65) 
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where the index, 1 ^ i 5C Ng, specifies the one-dimensional 
voxel coordinate, and hence the integration limits on the 
boundaries of the ith voxel are: 



a = a^inin + (j — l)Ax 
b = a + Ax, 

with cell width 

* ^max ^min 



(66) 
(67) 

(68) 



This allows us to write equation ( 64 1 as a sum over all voxels: 

(69) 



JVg.JVg.iVg 

f3,n= ^ fijklni{i)ln2{j)j[n3{k) 



providing a set of shapelet coeffecients that are used to cal- 
culate the anal ytic quantities of Section [3| 
Equation ( 65 I has recursion solutioiia 



with 



/?7ri/2 
2 



erf 



I3V2 



h{i) = ~pV2[Bo{x)]l 



(70) 

(71) 
(72) 



4.2 Optimal decomposition 

A key problem is the choice of parameters, (/3, rimax, Xc), 
to perform an optimal shapelet decomposition. We use the 
notation to refer to the best-fitting object centroid, as op- 
posed to the shapelet reconstructed value, x. A good choice 
of parameters will ensure compact representation of the orig- 
inal data in coefficient space, while retaining high accuracy. 
Well chosen parameters will also exclude any noise that may 
be present in the data. As Melchior et al. (2007) highlighted 
for the 2-d case, shapelet decompositions may appear good 
visually, so it is important to define an appropriate goodness 
of fit, particularly as shapelet space can be highly degener- 
ate. 

The P parameter is the characteristic scale of the object 
to be decomposed. Increasing /3 has the effect of increasing 
the amplitude of the shapelets and dilating them along all 
coordinate axes. Changing the amplitude of the shapelets 
has no effect on the optimisation as the obtained coefficients 
simply scale in proportion to the change in amplitude, i.e. 
/3 is a one-dimensional spatial parameter. 

The maximum number of coefficients needed relates to 
the complexity of the data. A value of rimax that is too low 
will likely result in loss of information regarding the small- 
est features; if rimax is too high, noise and arbitrary high- 
frequency variations will be reproduced. Moreover, with in- 
creasing rimax, the range of P and Xc values that give vi- 
able solutions increases. This is because the additional co- 
efficients can compensate for a poor choice of P and Xc- It 
is therefore important that a minimum optimal rimax value 



^ There is an error in the factors of /3 in equation (32) of Massey 
& Refregier (2005), which is corrected in the arXiv version of their 



is used, while not resulting in significant loss of structural 
information, along with the optimal /3 and Xc values. 

To determine appropriate rimax and P values, we solve 



for the two unknown quantities in equations ( 15 I and 1 16 1 



and 



3, max "3, mill- 



(73) 



(74) 



Consider a voxel grid centred on the coordinate origin 
with major axis length, Xmax = — a;min, which is taken to 
be twice the maximum particle distance, ^3, max, from the 
coordinate origin. In this case, the cell width is: 

2x 

max 



Ax ■ 



Choosing ^3,min ~ Ax/2, it follows that 

TTmax = {Ng - 3)/2 

and 

P = a;max/\/27Vg. 



(75) 



(76) 



(77) 



For specific applications, convergence studies may be a more 
appropriate way to select initial estimates for rimax and P, 
and the size of the data 'padding' region. 

To minimise the number of evaluations, and avoid some 
of the issues of generating shapelet coeffecients with too 
many orders, we impose the constraint [see Section 3.1 of 
Refregier (2003)]: 



^ {ni+n2 + n-j) ^ 



(78) 



This constraint means that the total number of shapelet 
terms to be evaluted for a given rimax is: 



A^'cval = ^ (rimax + 1)( + 2) (rimax + 3). 



(79) 



This last equation is the d = S version of the more general 
result : 



^ nva.l — 



d 



to obtain the unique set of n values satisfying: 



^ rin 



(80) 



(81) 



paper: arXiv:astro-ph/0408445 



Coefficient-based measurements may produce inaccu- 
rate results in cases when the chosen rimax, P and Xc val- 
ues result in a reconstructed shape that is truncated by the 
bounding cube of the original data grid (in other words, 
when the reconstructed shape is bigger than the original 
data). Ensuring that the parameter bounds previously out- 
lined are not traversed, i.e. through the use of the padding 
region, will prevent this from occurring. Moreover, estimates 
of the X and r^^jg may fail if Mo = 0, since they depend 
on the reciprocal of the zeroth moment. This may occur for 
values of P that are too large. 

Further discussion of strategies for optimal shapelet de- 
composition are beyond the scope of this paper - see Massey 
& Refregier (2005) for an approach based on the steepest de- 
scent method. We now investigate the decomposition process 
from an algorithmic viewpoint, and consider opportunities 
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for accelerating the computation of shapelet coefTecients us- 
ing graphics processing units. 



4.3 Algorithmic considerations 

The algorithm for obtaining a shapelet decomposition for a 
voxellated structure is: 

(i) Choose the target grid resolution, Ng, and desired 
'^max, which constrain the initial choice of /3. 

(ii) Generate an array of shapelet amplitude estimates, 
/s.Ti, with A^evai entries (i.e. the minimum number that must 
be calculated), and initialise to zero- values. 

(iii) Calculate the 1-dimensional /„(i) terms for all orders 
up to nmax, resulting in UmaxNg stored values of In(i)- 

(iv) Loop over the elements of the n vector, subject to 



the constraint of equation (781, then 



(a) For each set of n values, loop over Ng cells with 
indices {i,j,k) and calculate the quantity: 

f3,r, /3,n + /ni {i) X (j) X I„s (fc) X fijk- (82) 

(v) Output the shapelet amplitudes for further processing 
and analysis. 

The process for reconstructing a three-dimensional 
shape from its shapelet coeffecients proceeds as follows: 

(i) Create an empty shape, fs{x), with dimensions Ng, 
and zero all fijk values. 

(ii) Loop over n vector, subject to constraint of equation 
( 78 1, calculating: 



Mx) ~ Mx) + B3,„{x;l3). 



(83) 



An optional filter can be applied in the reconstruction 
by only adding the contributions from shapelet terms where 
/s.Ti meets a prescribed criteria. Such an approach may be 
useful for removing noise, or to investigate the dependence 
of the analytic solutions on a particular shapelet order - see 
the example application in Section [5. 3[ 

Obtaining a shapelet decomposition of a voxellated 
structure involves computing equation ( |69[ ) for all A^evai coef- 
ficients. This computation is both very regular and abundant 
in inherent parallelism - two traits that suggest a strong 
suitability for implementation on many-core computing ar- 
chitectures such as graphics processing units (CPUs). 

CPUs were originally developed to accelerate the ren- 
dering of three-dimensional graphics through the use of a 
custom processor with a highly parallel architecture. CPUs 
are now capable of supporting general (i.e. non-graphics) 
computations through the use of software platforms such 
as the Compute Unified Device Architecture (CUDA) from 
NVIDI^I^or implementations of the OpenCljj standard. 

We can assess the suitability of the shapelet algorithm 
for a GPU implementation by using an algorithm analysis 
approach similar to that of Barsdell, Barnes & Fluke (2010), 
who noted that the most important considerations for an 
algorithm on a GPU are: massive parallelism, branching, 
arithmetic intensity and memory access patterns. To begin 



http:/ /www. nvidia.com/objcct/cuda_home_new. html 
* http://www.khronos.org/opcncl/ 



with, we assess the amount of parallelism in the shapelet de- 
composition problem. For simplicity, we assume that data is 
placed inside a bounding box such that all of the dimensions 
are the same - if there are fewer grid points along one axis, 
these must be zero-padded to the maximum grid scale. 



The computation of equation ( 69 1 involves a summa- 



tion over the three coordinate dimensions, i,j,k, for each 
shapelet coefficient defined by ni,n2,n3. The computation 
over ni, n2, na is therefore entirely (or emhar as singly) paral- 
lel, as each coefficient can be computed independently. The 
summation over voxels also exhibits inherent parallelism, 
but requires some coordination between elements. For this 
reason we will first consider parallelising the shapelet algo- 
rithm only over the shapelet coefficients, and will assume 
the summations are performed sequentially. 

Parallelising the problem over the shapelet coefficients 
defined by ni , n2 , na allows a maximum of A^evai paral- 
lel threads to work on the problem simultaneously. For 
"-max = 20, this is 1771 threads. While this is likely to exceed 
the number of physical processor cores in any current hard- 
ware architecture, modern GPUs often require an order of 
magnitude more threads in flight before their full potential 
is reached. It is therefore likely that some of the summation 
will need to be parallelised in addition to the evaluation of 
the shapelet coefficients. One such approach would be to 
compute sums over slices of the data volume in parallel, be- 
fore combining them in a second stage of computation. This 
would increase the number of threads by a factor of Ng, 
which would almost certainly saturate the available hard- 
ware performance. 

The next concern is branching, which occurs when par- 
allel threads execute differing instructions as a result of a 
conditional statement. Besides the application of the con- 
straint ni -I- 712 -I- na < "max, the shapelet decomposition 
algorithm does not require any branching operations. We 
therefore conclude that this factor will not significantly in- 
fluence performance on a GPU. 

Arithmetic intensity is the ratio of arithmetic opera- 
tions to memory-access operations. A high arithmetic in- 
tensity means that the CPU's instruction hardware will be 
fully utilised; a low intensity means that getting data from 
memory to the processors will be a bottle-neck and perfor- 
mance will be limited. The total input data to the shapelet 
algorithm scales as 0{Ng), while the computation scales 
as 0{NcviLiNg). This implies a very high theoretical peak 
arithmetic intensity of O(A'ovai) ~ 0(1000). This would be 
achieved by re-using input data fijk for the computation 
of many ni,n2,na values. Assuming such behaviour could 
be effected, the performance would be limited by the arith- 
metic throughput of the hardware, and we would expect to 
see very good performance on a CPU. 

In practice, the re-use of data is achieved through the 
exploitation of a cache, which is an area of very fast memory 
in which small amounts of data can be stored. On NVIDIA 
GPUs, the specific cache we refer to is known as shared mem- 
ory. By loading a block of fijk data into shared memory, 
threads can re-use the data multiple times before having to 
load another block in. As all of the operations on the in- 
put data fijk scale as 0{Ng), there is no difference between 
cacheing a block of any particular shape; for simplicity we 
therefore consider cacheing a simple one-dimensional block 
of data in the i dimension. In this setup, the j and k indices 
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can remain constant during the computation of the block, 
which aUows the value J„2 {j)in3 (k) to be pre-computed and 
stored locally before computation of the block begins. If, in 
addition, the value of ni is made to remain constant over the 
local group of threads, then the values fijklni (i) can be pre- 
computed and stored in shared memory. The computation 
by each thread of the block of i values then only involves the 
multiplication and accumulation of two pre-computed val- 
ues. Multiplication followed by addition also happens to be 
the fastest operation available on current GPU hardware. 

The last concern is the memory access pattern exhibited 
by the algorithm. Fortunately, the regularity of the compu- 
tation means that data are typically accessed in an aligned 
and contiguous fashion, and there should therefore be no 
issues in achieving a high memory throughput. 

To reduce the computational overhead in the evalua- 



tion of equation (691, the integrals I„ can be pre-computed 



once for each input shape and stored in look-up tables. The 



recursion relation, equation (701, makes it practical to eval 



uate and store all shapelet orders up to rimax for Ng grid 
cells along one dimension. This involves only O(nmaxA^s) 
terms, and could be computed on the CPU without impact- 
ing on the overall performance of the algorithm. A further 
advantage of using the recursion relations is that sufficient 
numerical precision can be maintained, even for high n val- 
ues. If we calculated each shapelet term independently from 
equation then the pre-factors, (2"n\)~ '"^ tend to zero 
very rapidly, and for n > 30, cannot be stored sufficiently 
accurately in single precision. This requirement is of rele- 
vance to GPU implementation, as the greatest processing 
speed-ups offered by the current generation of GPUs is for 
single precision, and reduces the overall memory required by 
storing as 32-bit rather than 64-bit values. 

Given the strong degree of parallelism exhibited by the 
algorithm, the ability to efficiently cache the input data and 
take advantage of a very high arithmetic intensity, the ability 
to pre-compute the shapelet integral terms, and the fact that 
the core of the algorithm can be reduced to simple multiply- 
add operations, we conclude that an implementation of the 
shapelet decomposition algorithm on a GPU would likely 
achieve a level of performance very near the peak capabil- 
ity of the hardware. Shapelet decomposition thus stands to 
benefit significantly from current trends in commodity com- 
puting hardware, and may have an additional advantage 
over related methods that are unable to take advantage of 
massively-parallel architectures. 

The extension of the above algorithm analysis to d- 
dimensional shapelet decompositions should be straightfor- 
ward, and we expect the conclusions to remain unchanged; 
however, implementation complexity is likely to increase, 
particularly in the general case. 

For the application domain we now explore, viz. 3-d 
Cartesian shapelet representations of simulated dark mat- 
ter haloes, we have used a CPU-only implementation of the 
decomposition algorithm. 



5 THE SHAPES OF DARK MATTER HALOES 

If the only use of the shapelet approach was to calculate the 
analytic expressions of Section 3, then it would be a some- 
what ineffecient one, compared to direct numerical integra- 
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Figure 2. To select an appropriate /3scalc for classification of halo 
shapes, we calculate the average peak signal-to-noise ratio, {Pq), 
over 200 input haloes (markers; black solid line). Dashed lines 
represent the one-standard deviation error range. On the basis of 
this analysis, we choose /3scalc = 0.8, which presents a reasonable 
comprimise to a full optimisation process. 



tion of equations ( 24 1 , ( 30 1 , ( 35 I , ( 39 1 , and ( 42 1 . The benefit 



of the shapelet decomposition is that we now have additional 
information concerning the shape. Opportunities for clas- 
sifying three-dimensional structures based on the shapelet 
terms may be made through identification of the dominant 
shapelet terms, or by investigating relative weights of par- 
ticular shapelet orders. In this section, we demonstrate how 
three-dimensional shapelet analysis of dark matter halos 
suggests a new method for automatically classifying halo 
types. 

5.1 Shapes and sub-structure 

For some time, it has been known that Cold Dark Matter 
(CDM) cosmologies predict the formation of triaxial haloes 
(on average), with a slight preference for prolate haloes over 
oblate ones (Davis et al. 1985; Barnes & Efstathiou 1987 
Frenk et al. 1988; Dubinski & Carlberg 1991; Dubinski 1994 
Cole & Lacey 1996; Jing & Suto 2002; Kasun & Evrard 2005 
Bailin & Steinmatz 2005; Oguri et al. 2005; AUgood et al. 
2006; Knebe & Wiel3ner 2006; Kuhlen, Diemand & Madau 
2007). These studies include meeisuring the distribution of 
halo triaxalities, studying the effects of baryons (which tend 
to reduce the triaxiality compared to dark matter only mod- 
els), and investigating the relationships between halo shapes 
and angular momentum. 

The purely triaxial treatment of dark matter haloes 
overlooks another well-established result from CDM simula- 
tions: individual haloes do not have a smooth density profile 
- they contain sub-structure (Lacey & Cole 1993; Moore et 
al. 1999; Ghigna et al. 2000). While the triaxial nature of 
dark matter haloes can be expressed empirically (e.g. Jing 
& Suto 2002), quantifying the sub-structure remains a chal- 
lenge. A shapelet-space representation of dark matter haloes 
provides a potential solution. 
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Figure 3. The 12 most massive haloes from most massive (A; top left) to least massive (L; bottom right). Each panel comprises (left) 
input dark matter halo and (right) shapelet reconstructed halo, displayed as volume renderings of the logarithmic density. The coordinate 
ranges in each panel are not equal, but have been selected for clarity based on Ax for each halo. Shapelet parameters were Ng = 51, 
"max = 24, and j3 values arc in Table [T] The strong similarity between the input and shapelet reconstructed versions is apparent. 



To demonstrate our approach, we use a sample of 200 
candidate dark matter haloes selected from a cosmological 
A'^-body simulation performed with GADGET-2 (Springel 
2005). The cosmological parameters were f2o = 0.27, Ao = 
0.73, h = 0.71 and as = 0.9, and candidate haloes were iden- 
tified using the SubFind groupfinder (Springel et al. 2001). 

Using particle number, A'p, as a proxy for mass, we 
pay particular attention to the twelve most massive haloes, 
haloes A-L, and the twelve least massive, haloes M-X, from 
the sample. We consider these two-subsets as being represen- 
tative of typical halo shapes and presence of sub-structure, 
along with limiting any mass-dependent biases that may oc- 
cur. For each halo, the triaxality, T, is calculated using the 
approach described in Appendix [Bj and tabulated in Table 
[l] Further quantities presented in this table are described 
below. 

Of the twelve 'heavy' haloes, two are oblate (T 5C 1/3), 
eight are prolate (T ^ 2/3) and two are triaxial (1/3 < T < 
2/3). Both the oblate haloes (A and L) have clear central 
cores, while the triaxial haloes (D and J) do not possess 
such a core. None of the 'light' haloes are oblate, ten were 
prolate, and two were triaxial (this time, haloes with central 
cores) . 

We perform a three-dimensional shapelet decomposi- 
tion on each halo, with the following input parameters fixed: 

X = 24 and /3 = ^flscalca^max/ 



appropriate /Sacaio for classification of halo shapes, we define 
a fitness estimator in terms of the peak signal-to-noise ratio: 



= 20 1 



Max(/,.,fc) 



(84) 



where Majc(/ijfe) is the maximum value in the volume, and 
the mean-square error is: 



Ms = 



1 



E 



fijk fijk\ 



(85) 



We calculate the average peak signal-to-noise ratio, (Ps), 
over 200 input haloes (Figure |2] - markers; black solid 
line); dashed lines represent the one-standard deviation error 
range. On the basis of this analysis, we choose /3acaie = 0.8 
as providing the best fit to the input halo shapes, presenting 
a reasonable comprimise to a full optimisation process. 

To avoid orientation-dependent effects, haloes are ro- 
tated such that their principle axes are aligned with the 
coordinate axes (see Appendix [b]| . Halo particles are then 
smoothed to a grid using the triangle-shaped cloud smooth- 
ing strategy, providing number counts per voxel, which is 
equivalent to a density, Pijk- To deal with the large dynamic 
range in Pijk, the input shape is actually: 



Ng = 51, 



2A'^£,. To select an 



fijk = logio (1 + Pijk) . 



(86) 
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Figure 4. The 12 least massive haloes from most massive (M; top left) to least massive (X; bottom right). Each panel comprises (left) 
input dark matter halo and (right) shapelet reconstructed halo, displayed as volume renderings of the logarithmic density. The coordinate 
ranges in each panel are not equal, but have been selected for clarity based on Ax for each halo. Shapelet parameters were Ng = 51, 
"max = 24, and /3 values are in Table [T] The strong similarity between the input and shapelet reconstructed versions is apparent. 



Since each halo has a difTerent mass and hence physi- 
cal extent, the j3 value for each halo is different - see Ta- 
ble [l] The other columns in this table are: the cell-width, 
Ax, as defined in equation (751; the maximum voxel value 
from the input shape, /max, and the minimum and max- 
imum shapelet-recovered values, Smin and S'max, respec- 
tively. To enable quantitative comparisons between the in- 
put and reconstructed shapes we compute the quantitites 
S/ = Yli,j,k fijk ^iid Ss = Y.i,j,kfijk, and Ps. Numerical 
testing, where reconstructions were optimised by hand, sug- 
gested that E/ ~ Es and Ps 45 (Figure [2| represented a 
good shapelet fit for the grid resolution used. 

The m-th most dominant shapelet component of the re- 
construction has n = Dm, with amplitude fa^nm ~ /m.max. 
Except where indicated, Di = (0, 0, 0), so we also report the 
value of D2- The final two columns of Table [l] represent the 
result of 'by-eye' classifications of the spatial characteristics 
of each halo, C{I), and the shapelet profiles, C{S), into the 
three halo classes - see Section fS. 21 below. 

Figs. [3] and |4] show the results of the shapelet decom- 
position. For each halo, the left-hand panel shows the input 
shape, and the right-hand panel is reconstructed in shapelet 
space. Each image pair presents two-dimensional projections 
of fully three-dimensional, volume rendered structures. Vi- 
sual comparsion of pairs of images suggests that, qualita- 
tively, Cartesian shapelets represent an appropriate basis 



set for decomposition of dark matter haloes. Quantitatively, 
we find that: 



j ('5'max '5'niin) //ma 

|Es/E/-lj ^ 1%, 



11 sC 6%, 



(87) 
(88) 



and Ps ^ 45, so that even without a halo-specific optimisi- 
ation, there is excellent agreement between the input halo 
and its Cartesian shapelet reconstruction. 



5.2 Towards an automated shape classifer 

The 3-d shapelet approach provides a means to check the 
outcome of halo finding algorithms by identifying classes 
in shapelet space without needing to visually inspect en- 
tire halo-candidate catalogues. For 21 of the 24 haloes in 
Table [l] the dominant component is Di = (0, 0, 0) - in 
most cases, the zeroth-order shape has a high amplitude, 
which is not unexpected for haloes centred on the coor- 
dinate origin. Three haloes (I, S and V), however, receive 
their maximal contribution from a higher-order shapelet, 
Di = (ni, 0, 0), 711 ^ 1. We can use information on the rela- 
tive contributions of shapelet orders higher than the zeroth 
order term to enable a shapelet-based classification of dark 
matter halo shapes. 

Fig. [5] shows three characteristic patterns in shapelet 
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Table 1. Summary of halo properties and shapelet decomposition parameters for the sample of 24 dark matter haloes, classified into 
heavy (A-L) and light (M-X) samples. Table columns are: halo identifier; number of particles in halo, Np, a proxy for halo mass; halo 
triaxiality, T, determined from particle positions; scale parameter, /3, used for decomposition; the cell-width. Ax, as defined in equation 
| |75[ l; the maximum voxel values from the input shape, /max, and the shapelet-recovered minimum and maximum voxel values, Smin and 
Smax; and E/ = ^ j. f^j^ and = X]i j k fijk are used to characterise the recovered shapes, along with the peak signal-to-noise, 
Ps- The m-th most dominant shapelet component has n = Dm, with amplitude fs Dm ~ /m.max, and m = 1,2,...; except where 
indicated, Di = (0,0,0), so we also present results for D2- Halo classes are assigned 'by eye' through inspection of the real-space, Ci, 
and shapelet-space, Cg, representations - see Section |5.2| for details. 
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space, consistent with the general appearance of the haloes 
in Figs. [3] and |4] For each halo, all amplitudes are plotted 
in index order, with value varying most rapidly, then 722, 
and finally n\. The light grey vertical lines indicate values 
of (ni, 0, 0) where m = 0, 1, . . . , rimax; for rimax = 24, there 
are A'evai = 2925 shapelet coefficients. Shapelet amplitudes, 
represented by vertical black line segments, are plotted as 
Wn = ./'3,Ti//i,max, with the six most-dominant shapelet or- 
ders numbered and coloured red. We propose the following 
three classes: 

• Class 1: Halo T (top panel) has a central core, but no 
significant sub-structure. In shapelet space, it is dominated 
by the zeroth-order shapelet, with low amplitudes for higher 
orders. 

• Class 2: Halo D (middle panel) has a central core, and 
obvious sub-structure. Here, the zeroth-order shapelet again 
dominates, but there are several higher order shapelets with 
amplitudes < |/i,max. 

• Class 3: Halo I (bottom panel) has significant sub- 
structure and no central core. The zeroth shapelet is no 
longer always the dominant term, and there are several 
shapelet orders with amplitudes ~ /i,max. 

The initial alignment of each halo with the a;-axis is ap- 
parent, with obvious contributions from shapelet orders 
n = (ni, 0, 0). 

The fiexibility of the classification system is demon- 



strated in Figs. |6|8| We select three new intermediate mass 
haloes: Haloes 100, 101 and 102 (specific properties are 
listed in the captions). Performing shapelet decomposition 
on these haloes with arbitrary three-dimensional rotations, 
thus removing the alignment of the principle moments of in- 
ertia with the coordinate axes, we see that the basic features 
of the three shapelet classes remain. Haloes 100 and 101, 
with /i,max occuring for the zeroth-order shapelet, retain 
this behaviour, while the power in higher shapelet orders is 
distributed away from the (ni, 0, 0) values. This is not unex- 
pected from the behaviour of 2-d shapelets under rotations 
(see Refregier 2003). Rotation of Halo 102 (Fig. |8|, with 
two clear components, results in variation in the highest- 
amplitude shapelet coeffecients, suggesting the following fea- 
tures for identification of haloes of this type: either /i,max oc- 
curs for a shaplet order other than the zeroth-order, or there 
are one or more shapelet orders with amplitudes > 0.5/i,max- 

We use this heuristic to now attempt a purely (by-eye) 
shapelet-based selection of haloes with clear multiple sub- 
structures (Class 3). We apply the shapelet decomposition 
with the same input parameters as used throughout this ini- 
tial implementation, to a total of 176 haloes. Particle counts 
for this new set of haloes are in the range 865 55 A'p ^ 20033, 
noting that these haloes are at intermediate masses to the 
24 investigated previously. We identify 44 Class 3 haloes 
on the basis of their shapelet representation, the first 36 
of which are shown in Fig. |9j and all of which exhibit the 
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Figure 5. Three characteristic shapelet-space representations of dark matter haloes. Shapelet coefficient amplitudes are plotted in 
index order, with value varying most rapidly, then 712, and finally ni - the light grey vertical lines indicate values of (ni,0,0) 
with ni = 0, 1, . . . , riniax- Shapelet amplitudes, represented by vertical black line segments, are 103,71 = /3,n.//i,max, with the six most 
dominant shapelet orders numbered and coloured red. (Top) Halo T, Class 1 - central core, no significant sub-structure — dominated by 
zeroth-order shapelet, low-amplitude for higher orders. (Middle) Halo D, Class 2 - central core, significant sub-structure — /max occurs 
at Ti = (0,0,0) and several higher order shapelets have amplitudes ~ |/max. Bottom) Halo I, Class 3 — significant sub-structure, no 
central core - zeroth-order shapelet is not the dominant term (although in other Class 3 haloes, it can still be dominant), several orders 
with amplitudes ~ /max- 



expected spatial characteristics. Visual inspection of the re- 
maining 132 haloes suggests a futher 10 haloes that should 
have been identified from their shapelet representations. In 
all cases, reinvestigation of the shapelet distribution revealed 
that they were very close to meeting the criteria for a Class 
3-halo. While a more robust approach to classification is re- 
quired for a full implementation (e.g. using an appropriately- 
sized training set and the construction of a decision tree or 
neural network classifier), our results do suggest that there 
is benefit to performing classification of dark matter halo 
shapes in shapelet space. 

The existence of multiple cores in the dark matter 
haloes has implications for computation of halo triaxiality - 
the classification of 'heavy' haloes E, F, I, K as prolate based 



purely on the principle moments of inertia is somewhat mis- 
leading; in each case, an argument could be made that an 
isolated group has not been identified using, in this case, 
the Subfind algorithm. Our application to 176 haloes identi- 
fies 44(-|-10) prolate haloes where the inferred triaxality was 
based on counting potentially distinguishable sub-haloes as 
a single halo. A shapelet-based automated classifier provides 
a method of identifying such haloes without needing to vi- 
sually inspect each halo. 



5.3 Shapelet-based quantification 

As an example of quantitative analysis in shapelet space, 
we calculate the moment of inertia tensors from equation 
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Figure 6. Halo 100, Np = 2068, T = 0.80 (prolate), Class 2. The left-hand column shows four real-space configurations of the halo, 
with arbitrary rotations about the centre-of-mass. The right-hand column shows the corresponding shapelet-space configuration; Wn = 
/3,Ti//i.max, and the horizonal axis represents the sequential coefficients, n — see Section |5.2| General properties of the distribution of 
shapelet amplitudes are preserved regardless of orientation. 



( 43 \-( 45 1 and hence triaxiality, T. In Fig. 10 we plot the 



shapelet-based triaxiaUty, Tg, against the value determined 
from the original particle positions, T, for the sample of 176 
intermediate mass haloes. A least-squares fit to the data 
(solid line) gives Is = 0.96T + 0.02 with the Pearson co- 
effecient, r = 0.98. In Fig. [TT] we plot the ratio of Ts /T 



against the particle number, Np, which suggests that there 
is a slightly larger scatter for the lower mass haloes. We find 
that {Ts/T) = 0.99 ± 0.12, where the error is the sample 
standard deviation. Even though we have not performed a 
per-halo optimisation for (/3, rimax, Xc), the shapelet-based 
analytic result does indeed provide a very good estimator 
for the halo triaxiality. 



6 SUMMARY AND OUTLOOK 

We have extended the two-dimensional Cartesian shapelet 
formalism of Refregier (2003) to three dimensions, deriv- 
ing analytic expressions for the zeroth moment, object cen- 
troid, root-mean-square radius, and the components of the 
quadrupole moment and moment of inertia tensors. We also 
presented generalisations to d-dimensions. 

Further work is necessary to develop a robust and sys- 
tematic optimisation strategy for the decomposition pa- 
rameters, and the development of specfic applications for 
the three-dimensional shapelet technique requires such a 
strategy. There are also opportunities to develop the for- 
malism further, specifically extending it to include spher- 
ical shapelet functions [c.f. the alternative presentation of 
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Figure 7. Halo 101, Np = 204:7 , T = 0.92 (prolate), Class 1. The left-hand column shows four real-space configurations of the halo, 
with arbitrary rotations about the centre-of-mass. The right-hand column shows the corresponding shapelet-space configuration; Wn = 
/3,Ti//i.max, and the horizonal axis represents the sequential coefficients, n — see Section |5.2| General properties of the distribution of 
shapelet amplitudes are preserved regardless of orientation. 



two-dimensional Cartesian shapelets as polar shapelets by 
Massey & Refregier (2005)]. 

The shapelet decomposition algorithm exhibits at- 
tributes that make it an ideal target for implementation on 
modern, massively-parallel CPUs. Our algorithm analysis 
demonstrates that the computation is entirely (or embarass- 
ingly) parallel; has minimal or no branching; maintains a 
high ratio of arithmetic operations to memory-access oper- 
ations; and has a memory access pattern that will result in 
aligned or contiguous access to memory, required for achiev- 
ing a high memory throughput. With our proposed scheme 
of precomputing shapelet voxel-integral terms, the computa- 
tion reduces to a parallel series of multiply-add operations, 
which are almost ideal for CPUs - we anticipate achieving 
close to peak processing performance. Significantly reduc- 
ing the computation time for the shapelet decomposition. 



compared to CPU, means that more processing time is then 
available for optimisation. 

As an example application, we have demonstrated how 
three-dimensional shapelets can be used to study the com- 
plex sub-structures of dark matter haloes from cosmological 
Ai'-body simulations, including providing an alternative ap- 
proach to classifying the properties of haloes. Our prelimi- 
nary investigation suggests that halo triaxiality measured 
purely from the moment of inertia tensor may be incor- 
rect due to limitations of group finders that are not able 
to separate out what may be truly distinct sub-clumps. Im- 
provements to our current 'by eye' approach to classification 
could include development of a decision tree or neural net- 
work classifier, or the use of principle component analysis to 
significantly reduce the number of shapelet terms required 
for classification (Kelly & McKay 2004). 
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Figure 8. Halo 102, Np = 2005, T = 0.91 (prolate), Class 3. The left-hand column shows four real-space configurations of the halo, 
with arbitrary rotations about the centre-of-mass; the right-hand column shows the corresponding shapelet-space configuration; Wn = 
/3,Ti//i.max, and the horizonal axis represents the sequential coefficients, n — see Section |5.2| General properties of the distribution of 
shapelet amplitudes are preserved regardless of orientation. 



The shapelet formalism is virtually unexplored in the 
three-dimensional domain, ofTering opportunities for the fur- 
ther development of a methodology that can be used to 
quantify and analyse complex three-dimensional structures. 
Future applications of the three-dimensional shapelet te- 
chinique may include classification and parameterisation of 
sources identified in Hi spectral line data cubes; studying the 
shapes of voids in cosmological simulations (by considering 
an inverted density field); and the possibility to generate 
mock dark matter haloes through an extensive study of the 
distribution of shapelet amplitudes as a function of mass 
and triaxiality. 
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APPENDIX A: HERMITE POLYNOMIALS 

We collect here a number of key expressions relating to Her- 
mite polynomials, which prove useful in deriving the analytic 
properties of Section [3] 

Expressing the Hermite polynomials via the Rodrigues 
formula 

^^-W^Mre^'l^rK). (Al) 

one can show the important recursion relation 

H„+i{x) = 2xH„{x)-^^^^, (A2) 
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which further implies the shaplet basis functions satisfy: 
v/2(n + ^ (^x - -^^ B„{x- /3) (A3) 



and 



(^x^ - Bn(x;/3) = (2n + l)/3^B„(x;/3), (A4) 

which is the eigenvalue equation. Calculating the derivative 
terms, we have the further recurrence relations: 



Hn+i{x) = 2xH4x) ~ 2nH„-i{x) 



(A5) 



APPENDIX B: 
ROTATIONS 



TRIAXIALITY AND HALO 



Triaxiality of a dark matter halo is most easily expressed 
in terms of the principle moments of inertia. The princi- 
ple moments, {Ii, I2, 13), and the associated principle axes, 
(ei, £2, 63), are the eigenvalues and eigenvectors of the mo- 
ment of inertia tensor, /, respectively. Approximating an 
arbitrary halo as a triaxial ellipsoid of the form 



2 2 
a2 ^ 62 



+ 



= 1 



with c ^ b ^ a, then 

M . 



h = 



M 2 , 2^ 

— {a +c ) 

M , 2 ,2x 



(Bl) 

(B2) 
(B3) 
(B4) 



and Ii ^ I2 ^ la- Moreover, we can calculate the triaxiality 
parameter (Franx, Illingworth & de Zeeuw 1991): 



h-h 



(B5) 



enabling us to classify haloes as oblate (T ^ 1/3), triaxial 
(1/3 < T < 2/3), or prolate (T ^ 2/3). We define a sphere 
(a = 6 = c) to have T = 0. 

Since we have full information on particle positions, 
{x,y,z), from the cosmological simulation, we make use of 
this to simplify the computation of /. Specifically, we com- 
pute elements of / from particle positions, using standard 
eigensystem routines from the GNU Scientific Librarjj^ to 
solve for the principle eigenvectors and eigenvalues. To en- 
able comparisons between halos, we rotate each halo so that 
its principle axes are aligned with the Cartesian axes. First, 
we define an orthogonal coordinate system with unit vector 
directions 61,62 and 



n — Bl 



I 62. 



(B6) 



Although 63 is orthogonal to 61 and 62, explictly calculat- 
ing the cross-product ensures that we have a right-handed 
coordinate system. The Euler angles are: 

9i = sin^'(-62,.) (B7) 
e»2 = tan"^ (ei,z/n^) (B8) 
6*3 = tan"' (62,:r/2j . (B9) 



We use the standard C-function atan2, which returns the 
principle value tan~'(i//a;), for calculating 62 and 63, and is 
recommended for converting between rectangular and polar 
coordinates. 

Next, we build a general 3x3 rotation matrix: 



CyCz 4^ SxSySz 
CySz -1" SxSyCz 
CxSy 



CxSz 
CxCz 
Sx 



SyCz -f" SxCySz 

SySz ~\~ SxCyCz 

CxCy 



, (BIO) 



where Cx = cos{6x), Sx = sin(6ly) and similarly for y and z, 
from which we can determine the inverse matrix, 
easily via the transpose, R"^, and the adjunct matrix of R 
Each particle position, p, in the halo is now rotated around 
the origin to new coordinates: 



most 

T 



P = 



R V 



(Bll) 



http : //www . gnu . org/sof tware/gsl/manual/html_node/Eig6nsystems . html 
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